git clone https://github.com/alibaba/DataX.git
cd DataX
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
cd target/datax/datax/bin
python datax.py -r mysqlreader -w streamwriter
python datax.py ./mysql_2_local.json
执行最后有个统计信息
2017-10-31 20:34:16.229 [job-0] INFO JobContainer - PerfTrace not enable!
2017-10-31 20:34:16.230 [job-0] INFO StandAloneJobContainerCommunicator - Total 408147 records, 9795528 bytes | Speed 956.59KB/s, 40814 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 4.531s | All Task WaitReaderTime 0.284s | Percentage 100.00%
2017-10-31 20:34:16.231 [job-0] INFO JobContainer -
任务启动时刻 : 2017-10-31 20:34:05
任务结束时刻 : 2017-10-31 20:34:16
任务总计耗时 : 11s
任务平均流量 : 956.59KB/s
记录写入速度 : 40814rec/s
读出记录总数 : 408147
读写失败总数 : 0
得到配置文件模板
最后的配置文件
cat mysql_2_local.json
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.02
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "canal",
"password": "canal",
"column": [
"X",
"a",
"b"
],
"splitPk": "X",
"connection": [
{
"table": [
"xdual10"
],
"database": [
"test"
],
"jdbcUrl": [
"jdbc:mysql://172.28.3.26:3306/test"
]
}
]
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"print": true
}
}
}
]
}
}
ali的datax不支持binlog位点方式导出,一直想要这个功能,最开始想用sqoop来实现, 奈何是mr,实现了意义也不大,
参考DBZ 支持mysql一致性快照mysql导出,后端可以支持到其他存储。